338 research outputs found

    Dualities in tree representations

    A characterization is given of the tree T* such that BP(T*) equals the reversal of DFUDS(T). An immediate consequence is a rigorous characterization of the tree T̂ such that BP(T̂) = DFUDS(T̂). In summary, BP and DFUDS are unified within an encompassing framework, which might lead to future simplifications of queries on BP and/or DFUDS. The immediate benefits shown here are the identification of previously unnoted commonalities in recent work on the Range Minimum Query problem and improvements for the Minimum Length Interval Query problem.
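For readers unfamiliar with the two encodings, the following is a minimal sketch of how BP (balanced parentheses) and DFUDS (depth-first unary degree sequence) strings are usually built from an ordinal tree. The adjacency-dictionary format, the function names, and the example tree are assumptions made for illustration; they are not taken from the paper.

```python
def bp(tree, root):
    """Balanced parentheses: '(' on entering a node, ')' on leaving it."""
    out = []

    def visit(node):
        out.append("(")
        for child in tree.get(node, []):
            visit(child)
        out.append(")")

    visit(root)
    return "".join(out)


def dfuds(tree, root):
    """DFUDS: one leading '(', then for each node in preorder
    its degree d written in unary as d '(' followed by one ')'."""
    out = ["("]
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        out.append("(" * len(children) + ")")
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children))
    return "".join(out)


# Hypothetical example tree: a has children b and c, b has child d.
example = {"a": ["b", "c"], "b": ["d"]}
print(bp(example, "a"))     # ((())())
print(dfuds(example, "a"))  # ((()()))
```

Both strings have length 2n for a tree with n nodes, which is why the two encodings can be compared position by position.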

    Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina)

    Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, while several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly, and compared it with published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. N50 and L50 values were 38.73 kbp and 19.83 kbp, respectively, mean coverage was 44x and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (i) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo, and (ii) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.65% partial genomic features. De novo gene predictions identified 32,393 genes, representing 97.14% of the reference's annotated genes, whilst a BUSCO search against the mammalian orthologues database identified 71.1% complete, 11.7% fragmented and 17.2% missing orthologues, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that the lowland anoa genome contains 42.12% of repetitive regions. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species. [Abstract copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.]
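The contiguity statistics quoted above can be illustrated with a short sketch. Under the usual definitions, N50 is the length of the contig at which the cumulative length (longest contigs first) reaches half of the assembly, and L50 is the number of contigs needed to get there. The function name and the toy contig lengths are assumptions for illustration only; they are not data from the anoa assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken over
         contigs sorted from longest to shortest, reaches half the assembly.
    L50: how many contigs were needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contig lengths given")


# Toy contig lengths (hypothetical), total 300.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```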

    Using cascading Bloom filters to improve the memory usage for de Brujin graphs

    De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Because NGS datasets are very large, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3], which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte
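A rough sketch of the cascading idea, as it is commonly described, may help: a first Bloom filter stores the true k-mers, a second stores the neighbouring k-mers that the first wrongly accepts, a third stores the true k-mers that the second wrongly accepts, and so on, with the last small set kept exactly. Everything below (class and function names, filter sizes, number of levels, the SHA-256-based hashing, and the toy sequence) is an assumption chosen for illustration, not the authors' implementation or parameters. Queries are only meaningful for k-mers in the candidate set (true k-mers and their neighbours).

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbours(kmer):
    # All k-mers reachable by a one-letter right or left extension.
    for c in "ACGT":
        yield kmer[1:] + c
        yield c + kmer[:-1]


def build_cascade(true_kmers, candidates, levels=3, bits_per_item=4, n_hashes=2):
    filters = []
    members = set(true_kmers)            # set encoded at the current level
    others = set(candidates) - members   # set whose false positives must be caught
    for _ in range(levels):
        bf = BloomFilter(max(8, bits_per_item * len(members)), n_hashes)
        for x in members:
            bf.add(x)
        filters.append(bf)
        # False positives of this level become the members of the next level.
        members, others = {x for x in others if x in bf}, members
    return filters, members              # the last set is stored exactly


def query(x, filters, exact):
    for i, bf in enumerate(filters):
        if x not in bf:
            # Rejected by an odd-numbered filter (B1, B3, ...): absent.
            # Rejected by an even-numbered filter (B2, B4, ...): present.
            return i % 2 == 1
    return (x in exact) if len(filters) % 2 == 0 else (x not in exact)


true_set = kmers("ACGTTGCATGGCATCGA", 4)
cand = true_set | {n for x in true_set for n in neighbours(x)}
filters, exact = build_cascade(true_set, cand)
print(all(query(x, filters, exact) == (x in true_set) for x in cand))  # True
```

Because each level only has to encode the false positives of the previous one, the later filters can be much smaller than the first, which is where the memory saving comes from in this kind of scheme.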

    Safe and complete contig assembly via omnitigs

    Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. Since the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never answered: given a genome graph G (e.g. a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

    A framework for space-efficient string kernels

    String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the k-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data.
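To make the k-mer kernel concrete, here is a naive sketch that computes it directly from k-mer counts held in hash tables. This is deliberately not the space-efficient Burrows-Wheeler-based method the abstract describes; it only shows what quantity is being computed. Function names and the toy strings are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(s, k):
    """Count every length-k substring of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s, t, k):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items())


def normalized_kmer_kernel(s, t, k):
    """Cosine-normalized k-mer kernel, in [0, 1] for non-empty profiles."""
    denom = sqrt(kmer_kernel(s, s, k) * kmer_kernel(t, t, k))
    return kmer_kernel(s, t, k) / denom if denom else 0.0


# Toy strings: "ABAB" has AB x2, BA x1; "BABA" has BA x2, AB x1.
print(kmer_kernel("ABAB", "BABA", 2))  # 4
```

This direct approach needs memory proportional to the number of distinct k-mers, which is the cost the framework in the abstract aims to avoid.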

    STRONG: metagenomics strain resolution on assembly graphs

    We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo from multiple metagenome samples. STRONG performs coassembly and binning into metagenome-assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs, and their per-sample unitig coverages, for individual single-copy core genes (SCGs) in each MAG to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and their abundances. STRONG is validated using synthetic communities and, for a real anaerobic digestor time series, generates haplotypes that match those observed from long Nanopore reads.

    Consequences of breed formation on patterns of genomic diversity and differentiation: the case of highly diverse peripheral Iberian cattle

    Iberian primitive breeds exhibit a remarkable phenotypic diversity over a very limited geographical space. While genomic data are accumulating for most commercial cattle, they are still lacking for these primitive breeds. Whole-genome data are key to understanding the consequences of historic breed formation and the putative role of earlier admixture events in the observed diversity patterns.

    The value of the spineless monkey orange tree (Strychnos madagascariensis) for conservation of northern sportive lemurs (Lepilemur milanoii and L. ankaranensis)

    Tree hollows provide shelters for a large number of forest-dependent vertebrate species worldwide. In Madagascar, where high historical and ongoing rates of deforestation and forest degradation are responsible for a major environmental crisis, reduced availability of tree hollows may lead to declines in hollow-dwelling species such as sportive lemurs, one of the most species-rich groups of lemurs. The identification of native tree species used by hollow-dwelling lemurs may facilitate targeted management interventions to maintain or improve habitat quality for these lemurs. During an extensive survey of sportive lemurs in northern Madagascar, we identified one tree species, Strychnos madagascariensis (Loganiaceae), the spineless monkey orange tree, as a principal sleeping site of two species of northern sportive lemurs, Lepilemur ankaranensis and L. milanoii (Lepilemuridae). This tree species represented 32.5% (n=150) of the 458 sleeping sites recorded. This result suggests that S. madagascariensis may be valuable for the conservation of hollow-dwelling lemurs.